{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "Copy of Copy of object-detection.ipynb",
      "provenance": [],
      "collapsed_sections": [],
      "toc_visible": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "TZSL793i7KuM"
      },
      "source": [
        "# Setup\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "GmloKE6Qx8Lu"
      },
      "source": [
        "credentials = {\n",
        "  \"bucket\": \"BUCKET\",\n",
        "  \"access_key_id\": \"ACCESS_KEY_ID\",\n",
        "  \"secret_access_key\": \"SECRET_ACCESS_KEY\",\n",
        "  \"endpoint_url\": \"ENDPOINT_URL\"\n",
        "}"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "hVPzEKoLuEHy"
      },
      "source": [
        "NUM_TRAIN_STEPS = 500\n",
        "MODEL_TYPE = 'ssd_mobilenet_v1_quantized_300x300_coco14_sync_2018_07_18'\n",
        "CONFIG_TYPE = 'ssd_mobilenet_v1_quantized_300x300_coco14_sync'\n",
        "\n",
        "import os\n",
        "CLOUD_ANNOTATIONS_MOUNT = os.path.join('/content', credentials['bucket'])\n",
        "ANNOTATIONS_JSON_PATH   = os.path.join(CLOUD_ANNOTATIONS_MOUNT, '_annotations.json')\n",
        "\n",
        "CHECKPOINT_PATH = '/content/checkpoint'\n",
        "OUTPUT_PATH     = '/content/output'\n",
        "EXPORTED_PATH   = '/content/exported'\n",
        "DATA_PATH       = '/content/data'\n",
        "\n",
        "LABEL_MAP_PATH    = os.path.join(DATA_PATH, 'label_map.pbtxt')\n",
        "TRAIN_RECORD_PATH = os.path.join(DATA_PATH, 'train.record')\n",
        "VAL_RECORD_PATH   = os.path.join(DATA_PATH, 'val.record')"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3XINCKkPshgz"
      },
      "source": [
        "# Install the TensorFlow Object Detection API\n",
        "In order to use the TensorFlow Object Detection API, we need to clone it's GitHub Repo.\n",
        "\n",
        "### Dependencies\n",
        "Most of the dependencies required come preloaded in Google Colab. The only additional package we need to install is TensorFlow.js, which is used for converting our trained model to a model that is compatible for the web.\n",
        "\n",
        "### Protocol Buffers\n",
        "The TensorFlow Object Detection API relies on what are called `protocol buffers` (also known as `protobufs`). Protobufs are a language neutral way to describe information. That means you can write a protobuf once and then compile it to be used with other languages, like Python, Java or C.\n",
        "\n",
        "The `protoc` command used below is compiling all the protocol buffers in the `object_detection/protos` folder for Python.\n",
        "\n",
        "### Environment\n",
        "To use the object detection api we need to add it to our `PYTHONPATH` along with `slim` which contains code for training and evaluating several widely used Convolutional Neural Network (CNN) image classification models."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "o33_jgwGm3NV"
      },
      "source": [
        "%tensorflow_version 1.x\n",
        "import os\n",
        "import pathlib\n",
        "\n",
        "# Clone the tensorflow models repository if it doesn't already exist\n",
        "if \"models\" in pathlib.Path.cwd().parts:\n",
        "  while \"models\" in pathlib.Path.cwd().parts:\n",
        "    os.chdir('..')\n",
        "elif not pathlib.Path('models').exists():\n",
        "  !git clone --depth 1 https://github.com/cloud-annotations/models\n",
        "\n",
        "!pip install cloud-annotations==0.0.4\n",
        "!pip install tf_slim\n",
        "!pip install lvis\n",
        "!pip install --no-deps tensorflowjs==1.4.0\n",
        "\n",
        "%cd /content/models/research\n",
        "!protoc object_detection/protos/*.proto --python_out=.\n",
        "\n",
        "pwd = os.getcwd()\n",
        "os.environ['PYTHONPATH'] += f':{pwd}:{pwd}/slim'"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "wS1ZDbJ660Wv"
      },
      "source": [
        "# Test the setup\n",
        "If everything was set up properly and nothing went wrong, we should be able to run this command."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "iM8sOHwL64Rp"
      },
      "source": [
        "!python object_detection/builders/model_builder_tf1_test.py"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "G-DHE8xTssqT"
      },
      "source": [
        "# Mount Cloud Annotations Bucket\n",
        "In order to use files from Cloud Annotations we need to mount it to Colab."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "jydTH9gYQ3y7"
      },
      "source": [
        "import cloud_annotations as ca\n",
        "ca.mount(CLOUD_ANNOTATIONS_MOUNT, credentials)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "ISX8k0TfdDHj"
      },
      "source": [
        "# Generate a Label Map\n",
        "One piece of data the Object Detection API needs is a label map protobuf. The label map associates an integer id to the text representation of the label. The ids are indexed by 1, meaning the first label will have an id of 1 not 0.\n",
        "\n",
        "Here is an example of what a label map looks like:\n",
        "```\n",
        "item {\n",
        "  id: 1\n",
        "  name: 'Cat'\n",
        "}\n",
        "\n",
        "item {\n",
        "  id: 2\n",
        "  name: 'Dog'\n",
        "}\n",
        "\n",
        "item {\n",
        "  id: 3\n",
        "  name: 'Gold Fish'\n",
        "}\n",
        "```\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "nJsKCG3UdDsn"
      },
      "source": [
        "import os\n",
        "import json\n",
        "\n",
        "# Get a list of labels from the annotations.json\n",
        "labels = {}\n",
        "with open(ANNOTATIONS_JSON_PATH) as f:\n",
        "  annotations = json.load(f)\n",
        "  labels = annotations['labels']\n",
        "\n",
        "# Create a file named label_map.pbtxt\n",
        "os.makedirs(DATA_PATH, exist_ok=True)\n",
        "with open(LABEL_MAP_PATH, 'w') as f:\n",
        "  # Loop through all of the labels and write each label to the file with an id\n",
        "  for idx, label in enumerate(labels):\n",
        "    f.write('item {\\n')\n",
        "    f.write(\"\\tname: '{}'\\n\".format(label))\n",
        "    f.write('\\tid: {}\\n'.format(idx + 1)) # indexes must start at 1\n",
        "    f.write('}\\n')"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "siRNKiuvsz25"
      },
      "source": [
        "# Generate TFRecords\n",
        "The TensorFlow Object Detection API expects our data to be in the format of TFRecords.\n",
        "\n",
        "The TFRecord format is a collection of serialized feature dicts, one for each image, looking something like this:\n",
        "```\n",
        "{\n",
        "  'image/height': 1800,\n",
        "  'image/width': 2400,\n",
        "  'image/filename': 'image1.jpg',\n",
        "  'image/source_id': 'image1.jpg',\n",
        "  'image/encoded': ACTUAL_ENCODED_IMAGE_DATA_AS_BYTES,\n",
        "  'image/format': 'jpeg',\n",
        "  'image/object/bbox/xmin': [0.7255949630314233, 0.8845598428835489],\n",
        "  'image/object/bbox/xmax': [0.9695875693160814, 1.0000000000000000],\n",
        "  'image/object/bbox/ymin': [0.5820120073891626, 0.1829972290640394],\n",
        "  'image/object/bbox/ymax': [1.0000000000000000, 0.9662484605911330],\n",
        "  'image/object/class/text': (['Cat', 'Dog']),\n",
        "  'image/object/class/label': ([1, 2])\n",
        "}\n",
        "```"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "cAkOvP-gZR1x"
      },
      "source": [
        "import os\n",
        "import io\n",
        "import json\n",
        "import random\n",
        "\n",
        "import PIL.Image\n",
        "import tensorflow as tf\n",
        "\n",
        "from object_detection.utils import dataset_util\n",
        "from object_detection.utils import label_map_util\n",
        "\n",
        "def create_tf_record(images, annotations, label_map, image_path, output):\n",
        "  # Create a train.record TFRecord file.\n",
        "  with tf.python_io.TFRecordWriter(output) as writer:\n",
        "    # Loop through all the training examples.\n",
        "    for image_name in images:\n",
        "      try:\n",
        "        # Make sure the image is actually a file\n",
        "        img_path = os.path.join(image_path, image_name)   \n",
        "        if not os.path.isfile(img_path):\n",
        "          continue\n",
        "          \n",
        "        # Read in the image.\n",
        "        with tf.gfile.GFile(img_path, 'rb') as fid:\n",
        "          encoded_jpg = fid.read()\n",
        "\n",
        "        # Open the image with PIL so we can check that it's a jpeg and get the image\n",
        "        # dimensions.\n",
        "        encoded_jpg_io = io.BytesIO(encoded_jpg)\n",
        "        image = PIL.Image.open(encoded_jpg_io)\n",
        "        if image.format != 'JPEG':\n",
        "          raise ValueError('Image format not JPEG')\n",
        "\n",
        "        width, height = image.size\n",
        "\n",
        "        # Initialize all the arrays.\n",
        "        xmins = []\n",
        "        xmaxs = []\n",
        "        ymins = []\n",
        "        ymaxs = []\n",
        "        classes_text = []\n",
        "        classes = []\n",
        "\n",
        "        # The class text is the label name and the class is the id. If there are 3\n",
        "        # cats in the image and 1 dog, it may look something like this:\n",
        "        # classes_text = ['Cat', 'Cat', 'Dog', 'Cat']\n",
        "        # classes      = [  1  ,   1  ,   2  ,   1  ]\n",
        "\n",
        "        # For each image, loop through all the annotations and append their values.\n",
        "        for a in annotations[image_name]:\n",
        "          if (\"x\" in a and \"x2\" in a and \"y\" in a and \"y2\" in a):\n",
        "            label = a['label']\n",
        "            xmins.append(a[\"x\"])\n",
        "            xmaxs.append(a[\"x2\"])\n",
        "            ymins.append(a[\"y\"])\n",
        "            ymaxs.append(a[\"y2\"])\n",
        "            classes_text.append(label.encode(\"utf8\"))\n",
        "            classes.append(label_map[label])\n",
        "       \n",
        "        # Create the TFExample.\n",
        "        tf_example = tf.train.Example(features=tf.train.Features(feature={\n",
        "          'image/height': dataset_util.int64_feature(height),\n",
        "          'image/width': dataset_util.int64_feature(width),\n",
        "          'image/filename': dataset_util.bytes_feature(image_name.encode('utf8')),\n",
        "          'image/source_id': dataset_util.bytes_feature(image_name.encode('utf8')),\n",
        "          'image/encoded': dataset_util.bytes_feature(encoded_jpg),\n",
        "          'image/format': dataset_util.bytes_feature('jpeg'.encode('utf8')),\n",
        "          'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),\n",
        "          'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),\n",
        "          'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),\n",
        "          'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),\n",
        "          'image/object/class/text': dataset_util.bytes_list_feature(classes_text),\n",
        "          'image/object/class/label': dataset_util.int64_list_feature(classes),\n",
        "        }))\n",
        "        if tf_example:\n",
        "          # Write the TFExample to the TFRecord.\n",
        "          writer.write(tf_example.SerializeToString())\n",
        "      except ValueError:\n",
        "        print('Invalid example, ignoring.')\n",
        "        pass\n",
        "      except IOError:\n",
        "        print(\"Can't read example, ignoring.\")\n",
        "        pass\n",
        "\n",
        "with open(ANNOTATIONS_JSON_PATH) as f:\n",
        "  annotations = json.load(f)['annotations']\n",
        "  image_files = [image for image in annotations.keys()]\n",
        "  # Load the label map we created.\n",
        "  label_map = label_map_util.get_label_map_dict(LABEL_MAP_PATH)\n",
        "\n",
        "  random.seed(42)\n",
        "  random.shuffle(image_files)\n",
        "  num_train = int(0.7 * len(image_files))\n",
        "  train_examples = image_files[:num_train]\n",
        "  val_examples = image_files[num_train:]\n",
        "\n",
        "  create_tf_record(train_examples, annotations, label_map, CLOUD_ANNOTATIONS_MOUNT, TRAIN_RECORD_PATH)\n",
        "  create_tf_record(val_examples, annotations, label_map, CLOUD_ANNOTATIONS_MOUNT, VAL_RECORD_PATH)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "n6DhYpAS7gX2"
      },
      "source": [
        "# Download a base model\n",
        "Training a model from scratch can take days and tons of data. We can mitigate this by using a pretrained model checkpoint. Instead of starting from nothing, we can add to what was already learned with our own data.\n",
        "\n",
        "There are several pretrained model checkpoints that can be downloaded from the [model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md).\n",
        "\n",
        "The model we will be training is the SSD MobileNet architecture. SSD MobileNet models have a very small file size and can execute very quickly, compromising little accuracy, which makes it perfect for running in the browser. Additionally, we will be using `quantization`. When we say the model is `quantized` it means instead of using `float32` as the datatype of our numbers we are using `float16` or `int8`.\n",
        "\n",
        "- `float32(PI)` = `3.1415927` 32 bits\n",
        "- `float16(PI)` = `3.14` 16 bits\n",
        "- `int8(PI)` = `3` 8 bits\n",
        "\n",
        "We do this because it can cut our model size down by around a factor of 4! An unquantized version of SSD MobileNet that I trained was `22.3 MB`, but the quantized version was `5.7 MB` that's a `~75%` reduction 🎉"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "oHD1Jm0v7jfz"
      },
      "source": [
        "import os\n",
        "import tarfile\n",
        "\n",
        "import six.moves.urllib as urllib\n",
        "\n",
        "download_base = 'http://download.tensorflow.org/models/object_detection/'\n",
        "model = MODEL_TYPE + '.tar.gz'\n",
        "tmp = '/content/checkpoint.tar.gz'\n",
        "\n",
        "if not (os.path.exists(CHECKPOINT_PATH)):\n",
        "  # Download the checkpoint\n",
        "  opener = urllib.request.URLopener()\n",
        "  opener.retrieve(download_base + model, tmp)\n",
        "\n",
        "  # Extract all the `model.ckpt` files.\n",
        "  with tarfile.open(tmp) as tar:\n",
        "    for member in tar.getmembers():\n",
        "      member.name = os.path.basename(member.name)\n",
        "      if 'model.ckpt' in member.name:\n",
        "        tar.extract(member, path=CHECKPOINT_PATH)\n",
        "\n",
        "  os.remove(tmp)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "UXlvFvwUHrui"
      },
      "source": [
        "# Model config\n",
        "The final thing we need to do is inject our pipline with the amount of labels we have and where to find the label map, TFRecord and model checkpoint. We also need to change the the batch size, because the default batch size of 128 is too large for Colab to handle."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "C8CVExv6HsJS"
      },
      "source": [
        "import re\n",
        "\n",
        "from google.protobuf import text_format\n",
        "\n",
        "from object_detection.utils import config_util\n",
        "from object_detection.utils import label_map_util\n",
        "\n",
        "pipeline_skeleton = '/content/models/research/object_detection/samples/configs/' + CONFIG_TYPE + '.config'\n",
        "configs = config_util.get_configs_from_pipeline_file(pipeline_skeleton)\n",
        "\n",
        "label_map = label_map_util.get_label_map_dict(LABEL_MAP_PATH)\n",
        "num_classes = len(label_map.keys())\n",
        "meta_arch = configs[\"model\"].WhichOneof(\"model\")\n",
        "\n",
        "override_dict = {\n",
        "  'model.{}.num_classes'.format(meta_arch): num_classes,\n",
        "  'train_config.batch_size': 24,\n",
        "  'train_input_path': TRAIN_RECORD_PATH,\n",
        "  'eval_input_path': VAL_RECORD_PATH,\n",
        "  'train_config.fine_tune_checkpoint': os.path.join(CHECKPOINT_PATH, 'model.ckpt'),\n",
        "  'label_map_path': LABEL_MAP_PATH\n",
        "}\n",
        "\n",
        "configs = config_util.merge_external_params_with_configs(configs, kwargs_dict=override_dict)\n",
        "pipeline_config = config_util.create_pipeline_proto_from_configs(configs)\n",
        "config_util.save_pipeline_config(pipeline_config, DATA_PATH)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "FNYIZK1xVNAa"
      },
      "source": [
        "# Start training\n",
        "We can start a training run by calling the `model_main` script, passing:\n",
        "- The location of the `pipepline.config` we created\n",
        "- Where we want to save the model\n",
        "- How many steps we want to train the model (the longer you train, the more potential there is to learn)\n",
        "- The number of evaluation steps (or how often to test the model) gives us an idea of how well the model is doing"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Wv5h2bwBVO0V"
      },
      "source": [
        "!rm -rf $OUTPUT_PATH\n",
        "!python -m object_detection.model_main \\\n",
        "    --pipeline_config_path=$DATA_PATH/pipeline.config \\\n",
        "    --model_dir=$OUTPUT_PATH \\\n",
        "    --num_train_steps=$NUM_TRAIN_STEPS \\\n",
        "    --num_eval_steps=100"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "AwNtvgtdoB-C"
      },
      "source": [
        "# Export inference graph\n",
        "After your model has been trained, you might have a few checkpoints available. A checkpoint is usually emitted every 500 training steps. Each checkpoint is a snapshot of your model at that point in training. In the event that a long running training process crashes, you can pick up at the last checkpoint instead of starting from scratch.\n",
        "\n",
        "We need to export a checkpoint to a TensorFlow graph proto in order to actually use it. We use regex to find the checkpoint with the highest training step and export it."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "BZgP_FZUoE0d"
      },
      "source": [
        "import os\n",
        "import re\n",
        "\n",
        "regex = re.compile(r\"model\\.ckpt-([0-9]+)\\.index\")\n",
        "numbers = [int(regex.search(f).group(1)) for f in os.listdir(OUTPUT_PATH) if regex.search(f)]\n",
        "TRAINED_CHECKPOINT_PREFIX = os.path.join(OUTPUT_PATH, 'model.ckpt-{}'.format(max(numbers)))\n",
        "\n",
        "print(f'Using {TRAINED_CHECKPOINT_PREFIX}')\n",
        "\n",
        "!rm -rf $EXPORTED_PATH\n",
        "!python -m object_detection.export_inference_graph \\\n",
        "  --pipeline_config_path=$DATA_PATH/pipeline.config \\\n",
        "  --trained_checkpoint_prefix=$TRAINED_CHECKPOINT_PREFIX \\\n",
        "  --output_directory=$EXPORTED_PATH"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "NfIiNcb0OWk6"
      },
      "source": [
        "# Testing the model\n",
        "Let's test to see if our model is working. We only trained for 500 steps so the model's predictions might appear random or not even predict a box. Try taking a few photos. The results will look a lot better when we can try the model out on a real-time video stream later on.\n",
        "\n",
        "> **Homework:** Try training the model for 5,000 steps and see how the accuracy changes. Play around with other model formats, like the non-quantized version of SSD MobileNet V1 or Faster RCNN."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "HWHvlnyjmCIN"
      },
      "source": [
        "from IPython.display import display, Javascript, Image\n",
        "from google.colab.output import eval_js\n",
        "from base64 import b64decode\n",
        "\n",
        "# Use javascipt to take a photo.\n",
        "def take_photo(filename, quality=0.8):\n",
        "  js = Javascript('''\n",
        "    async function takePhoto(quality) {\n",
        "      const div = document.createElement('div');\n",
        "      const capture = document.createElement('button');\n",
        "      capture.textContent = 'Capture';\n",
        "      div.appendChild(capture);\n",
        "\n",
        "      const video = document.createElement('video');\n",
        "      video.style.display = 'block';\n",
        "      const stream = await navigator.mediaDevices.getUserMedia({video: true});\n",
        "\n",
        "      document.body.appendChild(div);\n",
        "      div.appendChild(video);\n",
        "      video.srcObject = stream;\n",
        "      await video.play();\n",
        "\n",
        "      // Resize the output to fit the video element.\n",
        "      google.colab.output.setIframeHeight(document.documentElement.scrollHeight, true);\n",
        "\n",
        "      // Wait for Capture to be clicked.\n",
        "      await new Promise((resolve) => capture.onclick = resolve);\n",
        "\n",
        "      const canvas = document.createElement('canvas');\n",
        "      canvas.width = video.videoWidth;\n",
        "      canvas.height = video.videoHeight;\n",
        "      canvas.getContext('2d').drawImage(video, 0, 0);\n",
        "      stream.getVideoTracks()[0].stop();\n",
        "      div.remove();\n",
        "      return canvas.toDataURL('image/jpeg', quality);\n",
        "    }\n",
        "    ''')\n",
        "  display(js)\n",
        "  data = eval_js('takePhoto({})'.format(quality))\n",
        "  binary = b64decode(data.split(',')[1])\n",
        "  with open(filename, 'wb') as f:\n",
        "    f.write(binary)\n",
        "  return filename\n",
        "\n",
        "try:\n",
        "  take_photo('/content/photo.jpg')\n",
        "except Exception as err:\n",
        "  # Errors will be thrown if the user does not have a webcam or if they do not\n",
        "  # grant the page permission to access it.\n",
        "  print(str(err))\n",
        "\n",
        "# Use the captured photo to make predictions\n",
        "%matplotlib inline\n",
        "\n",
        "import os\n",
        "import numpy as np\n",
        "from matplotlib import pyplot as plt\n",
        "from PIL import Image as PImage\n",
        "from object_detection.utils import visualization_utils as vis_util\n",
        "from object_detection.utils import label_map_util\n",
        "\n",
        "# Load the labels\n",
        "category_index = label_map_util.create_category_index_from_labelmap(LABEL_MAP_PATH, use_display_name=True)\n",
        "\n",
        "# Load the model\n",
        "path_to_frozen_graph = os.path.join(EXPORTED_PATH, 'frozen_inference_graph.pb')\n",
        "detection_graph = tf.Graph()\n",
        "with detection_graph.as_default():\n",
        "  od_graph_def = tf.GraphDef()\n",
        "  with tf.gfile.GFile(path_to_frozen_graph, 'rb') as fid:\n",
        "    serialized_graph = fid.read()\n",
        "    od_graph_def.ParseFromString(serialized_graph)\n",
        "    tf.import_graph_def(od_graph_def, name='')\n",
        "\n",
        "with detection_graph.as_default():\n",
        "  with tf.Session(graph=detection_graph) as sess:\n",
        "    # Definite input and output Tensors for detection_graph\n",
        "    image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')\n",
        "    # Each box represents a part of the image where a particular object was detected.\n",
        "    detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')\n",
        "    # Each score represent how level of confidence for each of the objects.\n",
        "    # Score is shown on the result image, together with the class label.\n",
        "    detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')\n",
        "    detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')\n",
        "    num_detections = detection_graph.get_tensor_by_name('num_detections:0')\n",
        "    image = PImage.open('/content/photo.jpg')\n",
        "    # the array based representation of the image will be used later in order to prepare the\n",
        "    # result image with boxes and labels on it.\n",
        "    (im_width, im_height) = image.size\n",
        "    image_np = np.array(image.getdata()).reshape((im_height, im_width, 3)).astype(np.uint8)\n",
        "    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]\n",
        "    image_np_expanded = np.expand_dims(image_np, axis=0)\n",
        "    # Actual detection.\n",
        "    (boxes, scores, classes, num) = sess.run(\n",
        "        [detection_boxes, detection_scores, detection_classes, num_detections],\n",
        "        feed_dict={image_tensor: image_np_expanded})\n",
        "    # Visualization of the results of a detection.\n",
        "    vis_util.visualize_boxes_and_labels_on_image_array(\n",
        "        image_np,\n",
        "        np.squeeze(boxes),\n",
        "        np.squeeze(classes).astype(np.int32),\n",
        "        np.squeeze(scores),\n",
        "        category_index,\n",
        "        use_normalized_coordinates=True,\n",
        "        line_thickness=8)\n",
        "    plt.figure(figsize=(12, 8))\n",
        "    plt.imshow(image_np)"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "yVmCc9NPkP3U"
      },
      "source": [
        "# Convert the model\n",
        "Once we have have the TensorFlow graph proto we can use it in Python, but we are going to take it a step further and convert the model to TensorFlow.js so we can use it directly in the browser.\n",
        "\n",
        "The model only detects objects as the IDs in out `label_map.pbtxt` so we also need to create a json list of all of our labels so we can map the ID back to a label."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "A-f5lfcnp01e"
      },
      "source": [
        "!tensorflowjs_converter \\\n",
        "  --input_format=tf_frozen_model \\\n",
        "  --output_format=tfjs_graph_model \\\n",
        "  --output_node_names='Postprocessor/ExpandDims_1,Postprocessor/Slice' \\\n",
        "  --quantization_bytes=1 \\\n",
        "  --skip_op_check \\\n",
        "  $EXPORTED_PATH/frozen_inference_graph.pb \\\n",
        "  /content/model_web\n",
        "\n",
        "import json\n",
        "\n",
        "from object_detection.utils.label_map_util import get_label_map_dict\n",
        "\n",
        "label_map = get_label_map_dict(LABEL_MAP_PATH)\n",
        "label_array = [k for k in sorted(label_map, key=label_map.get)]\n",
        "\n",
        "with open(os.path.join('/content/model_web', 'labels.json'), 'w') as f:\n",
        "  json.dump(label_array, f)\n",
        "\n",
        "!cd /content/model_web && zip -r /content/model_web.zip *"
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "q1iFI9pPr1l7"
      },
      "source": [
        "# Download the model\n",
        "> **Note:** Sometimes this command doesn't run or it will throw an error. Just try running it again.\n",
        "\n",
        "You can also download the model by right clicking on the `model_web.zip` file in the left sidebar file inspector."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "FL_miSj2r1yt"
      },
      "source": [
        "from google.colab import files\n",
        "files.download('/content/model_web.zip') "
      ],
      "execution_count": null,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "P3WHwIZ0QlVP"
      },
      "source": [
        "# Next steps / Using the model\n",
        "Use the `model_web` folder generated here, with this [project](https://github.com/cloud-annotations/object-detection-react) to create a real-time webcam object detection app."
      ]
    }
  ]
}